Condensing Sentences for Subtitle Generation

نویسندگان

  • Prokopis Prokopidis
  • Vassia Karra
  • Aggeliki Papagianopoulou
  • Stelios Piperidis
چکیده

Text condensation aims at shortening the length of an utterance without losing essential textual information. In this paper, we report on the implementation and preliminary evaluation of a sentence condensation tool for Greek using a manually constructed table of 450 lexical paraphrases, and a set of rules that delete syntactic subtrees that carry minor semantic information. Evaluation on two sentence sets show promising results regarding grammaticality and semantic acceptability of compressed versions.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Sentence Compression For Automatic Subtitling

This paper investigates sentence compression for automatic subtitle generation using supervised machine learning. We present a method for sentence compression as well as discuss generation of training data from compressed Finnish sentences, and different approaches to the problem. The method we present outperforms state-of-the-art baseline in both automatic and human evaluation. On real data, 4...

متن کامل

OpenSubtitles2016: Extracting Large Parallel Corpora from Movie and TV Subtitles

We present a new major release of the OpenSubtitles collection of parallel corpora. The release is compiled from a large database of movie and TV subtitles and includes a total of 1689 bitexts spanning 2.6 billion sentences across 60 languages. The release also incorporates a number of enhancements in the preprocessing and alignment of the subtitles, such as the automatic correction of OCR erro...

متن کامل

Discourse Constraints for Document Compression

Sentence compression holds promise for many applications ranging from summarization to subtitle generation. The task is typically performed on isolated sentences without taking the surrounding context into account, even though most applications would operate over entire documents. In this article we present a discourse-informed model which is capable of producing document compressions that are ...

متن کامل

Modelling Compression with Discourse Constraints

Sentence compression holds promise for many applications ranging from summarisation to subtitle generation. The task is typically performed on isolated sentences without taking the surrounding context into account, even though most applications would operate over entire documents. In this paper we present a discourse informed model which is capable of producing document compressions that are co...

متن کامل

Stelios Piperidis 1,2 Infrastructure for a Multilingual Subtitle Generation System

The expansion of digital television and the increasing demand to manipulate audiovisual content underlie the need for tools and systems that will automate the multilingual subtitle generation process. In this setting the MUSA project aims at providing a system which combines speech recognition, advanced text analysis, and machine translation to help generate multilingual subtitles. In its curre...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2008